My portfolio: an introduction


What is my corpus?

When the course started, we were asked to choose a corpus. It was important to find something that allowed for meaningful comparisons and contrasts, so we could answer a specific research question. This gave me an interesting idea: since 2014, I have been keeping track of all the songs that I have listened to, using a website called Last FM. Every time a track is played, on media players like Spotify or iTunes for example, a “scrobble” is recorded. This way, I have scrobbled a total of 121,587 tracks (and counting!). What better corpus to choose than one that contains a large part of all the music you have ever listened to? Although at the start of this course I had never worked with an API, and had only just picked up intermediate R skills in the third year of my Psychology degree, it sounded like an interesting challenge. So, I started googling.

Very soon after, I found out that this might become a daunting task. Collecting all my scrobbles from the Last FM API wasn’t the hard part; combining over 100,000 songs with Spotify features, however, was something I was not capable of. Luckily, I found a guide written by Andrew Walker, a researcher from the University of Florida, that included detailed instructions on how to do exactly this. Fetching the features would be the slowest part of the code, he said, likely taking up to 10-15 minutes. Obviously, for a dataset as large as mine, that was a gross underestimation. When I got the code working, I cut up the fetching process into two parts, my dataset into 5 parts, and let it all run sequentially. Six long hours of waiting later, it was finally there: all my scrobbles and corresponding Spotify features! From this point on I knew that analyzing my corpus could lead to some very interesting results.
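The idea of cutting the dataset into parts and fetching them one after another can be sketched roughly as follows. This is only an illustration in Python, not the code from the guide (which was written in R): `split_into_parts`, `fetch_sequentially` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for a real request to a features API.

```python
from typing import Callable, List, Sequence


def split_into_parts(items: Sequence, n_parts: int) -> List[list]:
    """Split items into n_parts contiguous chunks of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts each get one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def fetch_sequentially(parts: List[list], fetch_part: Callable[[list], list]) -> list:
    """Run fetch_part on each part in turn and collect all results."""
    results = []
    for number, part in enumerate(parts, start=1):
        results.extend(fetch_part(part))
        print(f"finished part {number} of {len(parts)}")
    return results


def fake_fetch(track_ids: list) -> list:
    # Hypothetical stand-in: a real version would call an API here.
    return [{"track_id": t, "energy": None} for t in track_ids]


track_ids = [f"track-{i}" for i in range(12)]
parts = split_into_parts(track_ids, 5)
features = fetch_sequentially(parts, fake_fetch)
```

Working in parts like this means that a failure halfway through only costs one part instead of the whole run.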

In this portfolio, I will try to answer one main research question: how does time influence my music listening? To answer this question, I will look at three different modes of time: (1) hour of the day, (2) month of the year, and (3) year of my life (also known as age, perhaps).

First, I will provide a visual overview of my data. Here you can find all sorts of interesting descriptive statistics for each year: how much music I’ve listened to, how my Spotify features have developed over time, and more. The most important explanatory variable is, of course, time.

Then, I will conduct more detailed analyses. Certain interesting patterns emerged from my preliminary analyses: how can they be explained? Can I find more information about them in chordograms, keygrams, and self-similarity matrices?

2015


Total song plays for the year 2015

Unique album plays for the year 2015

Unique song plays for the year 2015

In 2015, I was still in high school, and I was listening to a lot of Mac DeMarco. I put on his music and listened to all his albums from start to finish, pretty much on repeat. I have never listened to a single artist that much again, which is why the music I listened to in 2014 and 2015 is still at the top of my most played. Since then, I have started listening to a much wider variety of music, which can also be seen from my album plays in the gauges.


2016


Total song plays for the year 2016

Unique album plays for the year 2016

Unique song plays for the year 2016

At the start of 2016, I was in the middle of my gap year. I was spending a lot of time playing guitar and listening to music, but I was still in the early phases of discovery. I took all this into my first year of studying, where I was still listening to a lot of the music I found in the year before.


2017


Total song plays for the year 2017

Unique album plays for the year 2017

Unique song plays for the year 2017

For me, the year 2017 got off to a strange start. I had quit studying philosophy and I was living in Amsterdam, but I had no idea what direction my life was going in. This was the point at which I felt that I needed to take some bigger steps. You can see this very clearly in my music listening: the number of different albums I listened to nearly tripled! It will be interesting to see if we can also find some trends in the Spotify features from this year forward.


2018


Total song plays for the year 2018

Unique album plays for the year 2018

Unique song plays for the year 2018

2018 marked the start of something new. I started studying Psychology, and I was beginning to listen to music on a whole new level. Since I had to spend hours studying in the library, I started listening to different music as well: Boards of Canada was one of my go-to artists for studying, and has slowly become one of my favorite artists.


2019


Total song plays for the year 2019

Unique album plays for the year 2019

Unique song plays for the year 2019

It was 2019, and things started gaining traction. I was discovering more of my would-be favorite artists: I had already been listening to Yo La Tengo and Boards of Canada, and I came across all sorts of different nineties bands I just couldn’t seem to get around. Suddenly I was finding all sorts of electronic music I liked, which can be seen in the genre chart.


All Years


Total song plays from 2015 to 2019

Unique album plays from 2015 to 2019

Unique song plays from 2015 to 2019

Over the years I have listened to a very large amount of music. I know my music listening habits very well; can we also see this reflected in the data?


Clustering

Top 100 artists cluster


So I had downloaded this large amount of data, but where to go from there? For a while I was trying to find a way to group the artists based on genre. Unfortunately, Spotify doesn’t provide genre information, and Professor Burgoyne told us it would be hard. This sounded like a challenge of course, so I tried to think of ways to do it anyway. With the Last FM API at hand, I found out that it was possible to fetch “tags”. Tags are a way for Last FM users to give common labels to artists, and they usually reflect an artist’s genre quite well. So, I went ahead and wrote a script to fetch the tags for my top 100 artists. This was the easy part; where would I go from here?

I had a matrix with one row per artist, with 1’s for the tags they had and 0’s for the tags they didn’t have. I got stuck here for a while, since Psychology unfortunately did not provide me with a lot of help at this point. It wasn’t until the last week that a classmate pointed me towards k-means clustering. I didn’t believe it at first, but it was as simple as passing my matrix to the k-means function. The plot on the left is the result.
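To make the two steps concrete (building the 0/1 tag matrix and clustering its rows), here is a minimal sketch in plain Python. It is not the R code used for this portfolio: the artist names and tags are made up, and the deterministic "farthest point" initialization in `initial_centroids` is an assumption chosen to keep the example reproducible, whereas standard k-means implementations usually start from random centers.

```python
# Hypothetical artists and tags, purely for illustration.
artist_tags = {
    "Artist A": {"classic rock", "70s"},
    "Artist B": {"classic rock", "progressive rock"},
    "Artist C": {"electronic", "downtempo"},
    "Artist D": {"electronic", "ambient"},
}


def tag_matrix(artist_tags):
    """Return artists, tags and a 0/1 matrix with one row per artist."""
    artists = sorted(artist_tags)
    tags = sorted(set().union(*artist_tags.values()))
    rows = [[1 if tag in artist_tags[a] else 0 for tag in tags] for a in artists]
    return artists, tags, rows


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(rows, k):
    """Start from the first row, then repeatedly add the row farthest from all chosen centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(squared_distance(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=50):
    """Lloyd's algorithm: alternate between assigning rows and updating centroids."""
    centroids = initial_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


artists, tags, rows = tag_matrix(artist_tags)
labels = kmeans(rows, k=2)
clusters = dict(zip(artists, labels))
```

In this toy example, the two "classic rock" artists end up in one cluster and the two "electronic" artists in the other, because artists that share tags have rows that lie close together.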

The most obvious cluster is the seventies “classic rock” cluster at the top (cluster 3), containing artists like Pink Floyd and Steely Dan. This is the music it all started out with for me, and music I still feel I owe much to. The main genre I’ve listened to since I started using Last FM in 2014 is indie/ psychedelic rock. These are clusters 4 and 5, which you see in the bottom-left: cluster 4 is “Lo-fi, psychedelic/ indie” and cluster 5 is “Dream pop, psychedelic indie”. The blue cluster 1 on the right is the “electronica/ downtempo” cluster. Unfortunately, the remaining cluster, cluster 2, doesn’t make a lot of sense: it consists mostly of outliers. For my analyses, I put some of these artists into other clusters.

Which artists are represented most?


The cluster plot looked very promising! Judging by the plot, the artists actually seemed to be far enough apart for the different clusters to make sense. Judging by my own opinion, which is not algorithmically determined but based on hours of listening, I was very pleased to see these results.

I decided to choose 5 clusters, since using more would put artists into arbitrary categories. Using fewer would throw the nonsense cluster into the electronica/ downtempo cluster, which would not make sense, since these artists are not very related, and the electronica/ downtempo cluster is actually quite accurate. Two clusters for indie music might seem excessive, but these findings were actually quite robust, and knowing the artists, I know that separating Lo-fi rock from Dream pop makes sense. Also, if they were in a single cluster, it would make up too much of my entire corpus to be comparable to the other clusters.

To get an idea for how the clusters are represented in my corpus, on the left you can see each cluster, named by genre, and the artists that had the most plays for that cluster.

From cluster to genre


In this plot, you can see how much I have listened to each genre over the years. To get a valid measure, I calculated the proportion: the number of listens for a genre divided by the total number of listens for that year. As you can see, there are some definite changes. From 2014 to mid-2015, my final years of high school, the lo-fi, psychedelic/indie genre is decreasing and classic rock is increasing. When I got into my gap year, I discovered Steely Dan and started listening a lot to Electric Light Orchestra as well, artists that are represented very well in the classic rock cluster, as can be seen in the previous plot. After my gap year, in my first year of university, classic rock decreased, and a small peak in lo-fi, psychedelic/indie can be seen. Lo-fi, psychedelic/indie was still my main genre at this point, but I was discovering a lot of new music and leaving older music (a.k.a. classic rock) behind. My taste would soon start diverging though: after meeting a very special person who was very fond of downtempo music, I wanted to listen to it as well, a lot. She has inspired me to find and listen to a lot of new music. This can be seen very well in 2018 and 2019, when there is suddenly a steep rise in unique album listens.
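The proportion measure described above (listens for a genre divided by all listens in the same year) can be sketched as follows. This is an illustrative Python version with made-up plays, not the original R code; `plays` and `genre_proportions` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical plays: one (year, genre) pair per listen.
plays = [
    (2015, "lo-fi"), (2015, "lo-fi"), (2015, "lo-fi"), (2015, "classic rock"),
    (2019, "electronica"), (2019, "electronica"), (2019, "lo-fi"), (2019, "classic rock"),
]


def genre_proportions(plays):
    """Return {year: {genre: share of that year's listens}}."""
    totals = Counter(year for year, _ in plays)   # all listens per year
    counts = Counter(plays)                       # listens per (year, genre)
    proportions = defaultdict(dict)
    for (year, genre), n in counts.items():
        proportions[year][genre] = n / totals[year]
    return dict(proportions)


shares = genre_proportions(plays)
```

Dividing by the yearly total is what makes years with very different listening volumes comparable: within each year the shares add up to 1.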

In 2020, it seems my taste has crystallized into two clusters: electronic/ downtempo and lo-fi, psychedelic/indie. I listen to a lot of music in the other genres as well, but these are more for special occasions. Time will tell what changes in music taste are still waiting for me.

The next step: finding out what Spotify features can explain the genres.

Spotify features

Features over the years


My first plan when I had my data ready was to look at the Spotify features over time, to see if interesting patterns would emerge. The plot you see on the left shows a selection of the features plotted against the date, from 2014 to 2020. From the data on my front page you could already tell that my music listening has changed. This is also reflected in the features.

The biggest change over time seems to be the instrumentalness. The acousticness seems to be increasing as well, and the energy is decreasing. How can this be explained?

Different genres…


My cluster analysis seems to provide a lot of insight. On this page I provide more detail on how I conducted it. Using Last FM tags, keywords that Last FM users attach to an artist to describe them, I managed to cluster the artists into genres quite successfully.

As you can see, my preference for genre has changed quite a bit over time. Where I was listening to a lot of indie/lo-fi in 2015 (looking at you, Mac DeMarco), I am now listening to a wider range of genres, like electronica/ downtempo and ambient/ classical.

What is the relation between the change of genre over time and the change of features over time?

Different features


In this plot, you can see the relationship between my two main genres, and the energy and instrumentalness of the music I listen to. As the energy goes down, the amount of lo-fi/ indie music I listen to goes down. Conversely, the amount of electronica/ downtempo I listen to goes up. Also, as the instrumentalness goes up, electronica/ downtempo goes up, and lo-fi/ indie goes down. As explained previously, the electronica/ downtempo music I listen to has a higher instrumentalness and lower energy compared to lo-fi/ indie. The trend we see in this graph is therefore explained by the fact that I have been listening to a lot more electronic music.

The conclusion I draw from this is that not only my music taste but also my preference has changed over the years. As a student, my life has become busier and a lot more fun, but also a lot more exhausting. The moments I actually sit down to listen to music are the moments I like to use to wind down, and that is usually when I put on electronic music. Boards of Canada is a good example of this, and you can find an example of a song on my 2019 page.

Are these inferences true though?

Features: In-depth

Genres and features


“Correlation isn’t causation” is a common phrase for when people make unjustified inferences. As you can see, the instrumentalness of the electronic/ downtempo genre is indeed much higher than for lo-fi indie/psychedelic. However, the energy seems to be about the same. So, the trend I described earlier, where energy decreased as downtempo music increased, might not be true after all.

What is interesting, however, is the tempo. As the genre title describes, the music can be a bit… downtempo. It should be noted that the tempo is normalized to fit in the graph, but the differences remain. Since the tempo decreased over time, what can we see in a tempogram?

Tempo


For this plot, I used my top 15 songs from 2015 and 2019. Since I had to use the data I fetched from Last FM, I had a hard time getting the data into a shape that would work with the compmus package. I managed to get it right, however, and the resulting plot is quite interesting.

Here you can see the mean tempo plotted against the SD of tempo, with colour indicating tempo, size indicating song duration, and opacity indicating loudness. There seem to be quite large differences between 2015 and 2019. First of all, the range of tempo is much larger in 2019: it spans from ~70 to ~160, while in 2015 the tempo is clustered around 100. This indicates that in 2019, my music taste has become more varied, as we already saw on the first page. This can also be seen in song duration and loudness: in 2015 they seem to be similar across songs, while in 2019 they seem to vary more.
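As a rough illustration of what a point in such a plot summarizes, the sketch below computes a mean and a standard deviation of tempo per song. It assumes that each song comes with several tempo estimates (for example one per section), which is an assumption of this example rather than a description of the exact computation behind the plot; the song titles and numbers are made up.

```python
from statistics import mean, stdev

# Hypothetical tempo estimates in BPM, several per song.
section_tempi = {
    "steady song": [100.0, 102.0, 98.0],
    "drifting song": [140.0, 150.0, 160.0],
}


def tempo_summary(tempi):
    """Summarize one song by the mean and sample SD of its tempo estimates."""
    return {"mean": mean(tempi), "sd": stdev(tempi)}


summaries = {title: tempo_summary(t) for title, t in section_tempi.items()}
```

A song whose tempo stays close to one value gets a small SD, while a song whose tempo drifts gets a large one, regardless of how fast the song is on average.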

What we also see is that the tempo of the songs in 2019 is indeed generally lower than in 2015. Genre is not included in this plot, but the tempo of my listening has decreased over the years, which has to come from a change in the music itself; and since the electronica/ downtempo genre has a lower tempo, it seems safe to say that the two are related.

Interestingly, the standard deviation of tempo seems to increase with tempo. Does this mean that higher tempo songs also have a higher deviation in tempo? I have no clue.

Song key

Histogram of the keys of songs listened to in 2015 and 2019


In my portfolio, I want to see how my music listening has changed over the years. To analyse this, I have a corpus that consists of the songs I have listened to since 2014. One thing that might have changed is the key of the songs that I have listened to. To analyse this, I made a histogram of all the keys of the songs in 2015, my final year of high school, and 2019, when I was halfway through my second year of Psychology.

In the histogram, you can see for every key what proportion it makes up of all the keys of the songs that I listened to in 2015 and 2019. It seems that D is the most popular, and D# the least. There are slight differences in key between the years, but most noticeably, it seems I am listening to far fewer songs in the key of A. Why is this?

To figure this out, I made a table of the artists I listened to in each year that wrote songs in A, and looked at the artists with the highest frequency. Not to my surprise, most songs in A were written by artists like Mac DeMarco, Beach House, Grizzly Bear, and The Black Keys, which are all alternative/ indie artists using guitars. The A chord is popular in songs written on guitar, since it can be played as an open chord, and it goes well with many other open chords. Since 2015, I have started listening to a lot less guitar-centered music, which might explain why I am also listening to fewer songs in the key of A.
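The per-key proportions behind a histogram like this one can be computed along the following lines. This is a minimal Python sketch with made-up data, not the original R code. It relies on the fact that Spotify encodes the key of a track as a pitch class number (0 = C, 1 = C#, and so on, with -1 when no key was detected); `keys_by_year` and `key_proportions` are hypothetical names.

```python
from collections import Counter

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical key values, one per listened song.
keys_by_year = {
    2015: [9, 9, 2, 0],
    2019: [2, 2, 9, 7],
}


def key_proportions(keys):
    """Return {key name: share of songs in that key}, ignoring undetected keys (-1)."""
    valid = [k for k in keys if 0 <= k <= 11]
    counts = Counter(valid)
    return {PITCH_CLASSES[k]: counts[k] / len(valid) for k in sorted(counts)}


proportions = {year: key_proportions(keys) for year, keys in keys_by_year.items()}
```

In this toy data, the share of songs in A drops from 0.5 in 2015 to 0.25 in 2019, which is the kind of difference the histogram makes visible.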

Final thoughts